Abstract

Investigations of the legacy of natural selection in the human genome have 
proved particularly informative, pinpointing functionally important regions that 
have participated in our genetic adaptation to the environment. Furthermore, 
genetic dissection of the intensity and type of selection acting on human genes 
can be used to predict involvement in different forms and severities of human 
diseases. We review here the progress made in population genetics studies toward 
understanding the effects of selection, in its different forms and intensities, on 
human genome diversity. We discuss some outstanding, robust examples of genes 
and biological functions subject to strong dietary, climatic and pathogen selection 
pressures. We also explore the possible relationship between cancer and natural 
selection, a topic that has been largely neglected because cancer is generally seen 
as a late-onset disease. Finally, we discuss how the present-day incidence of some 
diseases of modern societies may represent a by-product of past adaptation to 
other selective forces and changes in lifestyle. This perspective thus illustrates the 
value of adopting a population genetics approach in delineating the biological 
mechanisms that have played a major evolutionary role in the way humans have 
genetically adapted to different environments and lifestyles over time. 



Introduction 

Our understanding of the patterns of human genome 
diversity has improved considerably over the last 10 years. 
The complete sequence of the human genome, which was 
published in 2001 (Lander et al. 2001; Venter et al. 2001), 
provided us with information about the location and geno- 
mic structure of genes, but did little to improve our under- 
standing of the genetic diversity of the human genome at 
population level. A number of international consortia have 
since investigated the genetic differences both between 
individuals from the same population and between differ- 
ent populations from around the world. The International 
HapMap project (International HapMap Consortium 
2005; Frazer et al. 2007; Altshuler et al. 2010), based on 
genotyping technologies, and, more recently, the 1000 
genomes project (1000 Genomes Project Consortium 
2010), based on next-generation sequencing, have made an 
enormous contribution to the identification and characterization
of different types of variation in different populations
worldwide. The HapMap project, for example, has
cataloged both allele frequencies and levels of genetic asso- 
ciation (assessed by measuring linkage disequilibrium, LD) 
across several populations, for 3.5 million single nucleotide 
polymorphisms (SNPs) (International HapMap Consor- 
tium 2005; Frazer et al. 2007; Altshuler et al. 2010). The 
1000 Genomes project has described the location and allele 
frequencies of approximately 15 million SNPs, 1 million 
short insertions and deletions, and 20 000 structural vari- 
ants. Interestingly, the data obtained suggest that each of us 
carries 250-300 loss-of-function variants of known genes, 
and that we are heterozygous for 50-100 variants known to 
be involved in genetic disorders (1000 Genomes Project 
Consortium 2010; Abecasis et al. 2012). 

These data sets for human genetic variation in health 
have also been very useful for the interpretation and design 
of genome-wide association studies (GWAS) on many 
human diseases. By the end of 2012, more than 1500 
GWAS had been published on more than 250 traits or
diseases as varied as height, hair color, obesity, smoking 
behavior, schizophrenia and colorectal or bladder cancer 
(see the National Human Genome Research Institute catalog
of published GWA studies: http://www.genome.gov/26525384).
These GWAS have made a major contribution
to our understanding of the genetic basis of human pheno- 
typic variation in both health and disease, by identifying 
genes/variants associated with the variation of traits of 
interest, disease susceptibility/severity and differences in 
response to treatment. 

The evolutionary and population genetics 
approach 

Various factors shape the diversity of the human genome at 
the population level and may contribute to phenotypic var- 
iation. Mutation and recombination create and reshuffle, 
respectively, diversity within the chromosomes. Other fac- 
tors, such as demographic processes and the cultural behav- 
ior of human populations, then affect allelic frequencies 
within populations (Salem et al. 1996; Seielstad et al. 1998; 
Chaix et al. 2004, 2007; Wilder et al. 2004). Both genetic 
and archeological evidence have supported a common, 
recent origin of all humans in Africa, followed by range 
expansion and dispersal out-of-Africa (i.e., the Out-of- 
Africa hypothesis) (Lewin 1987; Quintana-Murci et al. 
1999; Cavalli-Sforza and Feldman 2003; Macaulay et al. 
2005; Mellars 2006; Fagundes et al. 2007; Laval et al. 2010). 
However, there is increasing evidence to suggest that such 
dispersals of modern humans were accompanied by some 
degree of admixture with local populations of ancient hom- 
inids. Indeed, recent estimates suggest that Neanderthals 
contributed 1-4% to modern Eurasian genomes, and Den- 
isovans contributed 4-6% to modern Melanesian genomes 
(Green et al. 2010; Reich et al. 2010). Furthermore, adaptive
introgression of archaic HLA haplotypes from Neanderthal
and Denisovan genomes has been documented
among some modern human populations (Abi-Rached
et al. 2011), highlighting that the Out-of-Africa
hypothesis oversimplifies a much more complex scenario.

The colonization of new geographic regions led to the 
exposure of human populations to different, changing 
environments, and these differences in climate, nutritional 
resources or pathogens acted as selective forces, to which 
human populations had to adapt if they were to survive 
(Sabeti et al. 2006; Nielsen et al. 2007; Novembre and Di 
Rienzo 2009). This brings us neatly to the concept of natu- 
ral selection, as genetic variants increasing fitness would 
have been conserved in such conditions, thereby increasing 
in frequency, whereas deleterious variants would have been 
rapidly eliminated, contributing to the process of genetic 
adaptation. 



Investigations of the legacy of natural selection events in 
the past within the human genome and of the ways in 
which these selective events have shaped current genetic 
diversity have proved particularly informative, pinpointing 
functionally important regions of the genome (Bamshad 
and Wooding 2003; Akey et al. 2004; Bustamante et al. 
2005; Sabeti et al. 2006, 2007; Voight et al. 2006; Barreiro 
et al. 2008; Pickrell et al. 2009; Altshuler et al. 2010). 
Indeed, studies of whether and how selection has targeted 
particular genes in the human species as a whole, or in spe- 
cific human populations, constitute a powerful approach 
for identifying genes that have played (and probably con- 
tinue to play) an essential biological role in our survival, 
and distinguishing between such genes and those with a 
higher degree of redundancy (Sabeti et al. 2006; Quintana- 
Murci et al. 2007; Barreiro and Quintana-Murci 2010). 
Furthermore, population genetic dissection of the intensity 
and type of selection acting on human genes facilitates the 
identification of genes likely to be involved in rare, severe 
Mendelian diseases and makes it easier to distinguish 
between these genes and those most likely to be involved in 
complex susceptibility to disease. Overall, this approach 
increases our understanding of the ways in which past 
selection events have contributed to current differences in 
resistance/susceptibility to disease (Blekhman et al. 2008). 

Natural selection acts in different forms 

Natural selection can take many different forms and act 
with different intensities. The most common type of natu- 
ral selection is probably purifying selection, also known as 
negative selection, which affects most genes, to various 
extents (Bamshad and Wooding 2003; Bustamante et al. 
2005; Nielsen et al. 2005). Purifying selection decreases the 
frequency of mutations that prove to be disadvantageous to 
carriers in a given environment, the magnitude of this 
decrease depending on the degree to which the mutation is 
deleterious. If the effects of the variant are highly deleteri- 
ous to carriers, or even lethal, the variant is purged from 
the population by strong purifying selection. The frequent 
removal of deleterious variants can also result in the occa- 
sional removal of neutral linked variation (particularly in 
low-recombining regions), a phenomenon known as background
selection. Genes subject to strong purifying selection generally
display a major deficit of non-synonymous
mutations. Such genes are thus likely to play an essential
role in the host, with few, if any, amino-acid substitutions 
tolerated. In this case, the occurrence of new amino-acid 
variants may lead to severe Mendelian diseases. Purifying 
selection is weaker for mutations with effects that are only 
mildly deleterious. This allows such mutations to accumu- 
late at the population level and to be maintained at a low 
frequency. In this case, an excess of low-frequency alleles is 
generally observed for the gene concerned, with no additional
signature. Genes evolving under weaker evolutionary
constraints are generally thought to be involved in less 
important, or more redundant, biological functions than 
those evolving under strong purifying selection. 

In some cases, the occurrence of new genetic variants 
may be advantageous in a given environment, increasing 
the fitness of individuals, and the frequency of such vari- 
ants may therefore rise due to positive selection. This pro- 
cess is also known as directional or Darwinian selection 
(Sabeti et al. 2006; Nielsen et al. 2007). When an advanta- 
geous mutation increases in frequency in the population as 
a result of positive selection, linked neutral variation will be 
dragged along with it — a process known as genetic hitchhik- 
ing. As a consequence, variation that is not associated with 
the selected allele is eliminated, resulting in a selective sweep 
that leads to an overall reduction of genetic diversity 
around the selected site. In such cases, several molecular 
signatures may be observed, including a skew in the distri- 
bution of allele frequencies toward an excess of both low- 
frequency alleles and high-frequency derived alleles (i.e., 
derived alleles with respect to the corresponding chimpan- 
zee alleles), and a transitory increase in the strength of link- 
age disequilibrium associated with the selected allele(s). 

Finally, several alleles may coexist at a given locus if they 
are advantageous individually or together, due to the effects 
of balancing selection (Charlesworth 2006; Hurst 2009). 
This is a general type of selective regime that favors the 
maintenance of diversity in a population. There are two 
main mechanisms by which balancing selection preserves 
polymorphism: heterozygote advantage and frequency- 
dependent selection. 

Studies of the effects of selection at the interspecies level 
(e.g., comparing the human and chimpanzee lineages) can 
improve our understanding of the genetic mechanisms and 
traits underlying speciation. In turn, studies of the effects 
of selection on specific human populations are particularly 
useful for the identification of genes/variants responsible 
for the phenotypic diversity observed in human popula- 
tions, in health (e.g., stature, skin color, hair development) 
and disease (e.g., differential susceptibility to infectious dis- 
eases or differences in response to treatment or vaccina- 
tion). 

Neutrality tests for detecting the effects of 
selection 

Each type of selection leaves a distinctive molecular signature
(e.g., in nucleotide diversity, allele frequency spectrum, or
haplotype length) in the genomic region concerned. Such molecular
signatures can be detected with various statistical tests
that can be broadly subdivided into those that search for 
selection at the inter-species level (e.g., human versus
chimpanzee) and those that focus on particular aspects of
within-species data (e.g., within the human lineage or 
between human populations). 

Detection of selection between species 

Inter-species neutrality tests make use of data concerning 
the divergence between closely related species, such as 
humans and chimpanzees, to detect relatively ancient selec- 
tive events, such as those occurring at the time at which the 
two species considered diverged. In some cases, these tests 
also take into account data concerning polymorphism 
within species, making it possible to detect both older and 
more recent selective events. Inter-species tests include the 
traditional dN/dS test (Yang 1998), the McDonald-Kreit- 
man (MK) test (McDonald and Kreitman 1991), and its 
extension MKPRF (Sawyer and Hartl 1992; Bustamante 
et al. 2005). The dN/dS test detects selection acting on protein-coding
loci by computing the ratio of the non-synonymous
(dN) to the synonymous (dS) substitution rate. Under
neutrality, synonymous and non-synonymous substitutions 
should occur at a similar rate and we would therefore 
expect dN/dS = 1. However, the negative selection of non- 
synonymous variants would result in dN/dS < 1, whereas 
the positive selection of such variants would result in dN/ 
dS > 1. In the MK test, which also considers polymorphic 
data, the two classes of mutations are assumed to be evolu- 
tionarily equivalent: patterns of polymorphism and diver- 
gence should therefore be the same for both classes of 
mutations. Non-independence between the counts of non- 
synonymous and synonymous polymorphisms and fixed 
differences (i.e., variants that reach fixation and are differ- 
ent between species) is assessed by Fisher's exact test. An 
excess of fixed differences for non-synonymous mutations 
(assumed to be subject to selection) with respect to synony- 
mous variants is typically considered indicative of adaptive 
evolution. In contrast, an excess of polymorphic non-syn- 
onymous variants may result from weak negative selection 
or local, population-specific positive selection. 
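
The logic of the MK test described above can be illustrated with a short sketch. The code below is only a minimal illustration, assuming a 2x2 table of fixed versus polymorphic, non-synonymous versus synonymous counts; the function names (`fisher_exact_two_sided`, `mk_test`) and the counts in the usage note are hypothetical and serve only to show the calculation. The neutrality index returned alongside the p-value is a common summary of the same table and is an addition made for this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman style comparison (illustrative sketch).

    dn, ds: fixed non-synonymous / synonymous differences between species
    pn, ps: non-synonymous / synonymous polymorphisms within species
    Returns (p_value, neutrality_index); an index below 1 reflects an
    excess of fixed non-synonymous differences.
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    neutrality_index = (pn / ps) / (dn / ds)
    return p_value, neutrality_index
```

For instance, with hypothetical counts `mk_test(dn=7, ds=17, pn=2, ps=42)`, the p-value falls below 0.05 and the neutrality index is below 1, the pattern that would typically be read as an excess of fixed non-synonymous differences.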

Detection of selection within and between human 
populations 

Intra-species neutrality tests focus on the level of polymor- 
phism within a single species, such as humans, and there- 
fore detect selection events that have occurred more 
recently. They can be subdivided into distinct groups, each 
focusing on different aspects of the genetic data. Allele fre- 
quency spectrum-based tests determine whether the fre- 
quency spectrum of mutations conforms to the 
expectations of the standard neutral model, and are exten- 
sively reviewed elsewhere (Kreitman 2000; Nielsen et al. 
2005). Deviations from neutrality in the distribution of 
allele frequencies within populations can be measured with
various tests, including Tajima's D, Fu and Li's D* and F* 
and Fay and Wu's H tests (Tajima 1989; Fu and Li 1993; 
Fay and Wu 2000). Negative values of Tajima's D and Fu 
and Li's D* and F* generally indicate an excess of rare 
alleles, consistent with the occurrence of negative or posi- 
tive selection, whereas positive values of these statistics typ- 
ically reflect an excess of alleles of intermediate frequency, 
due to balancing selection. Furthermore, Fay and Wu's H 
can be used to detect positive selection events (i.e., selective
sweeps), through the demonstration of an excess of
high-frequency derived alleles of the targeted genes (or 
linked variation). 
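
As a concrete illustration of a frequency spectrum-based statistic, the sketch below computes Tajima's D from the number of derived-allele copies observed at each segregating site, using the standard constants from Tajima (1989). It is a minimal sketch rather than a reference implementation: the function name `tajimas_d` and its input format are assumptions made for this example, and no significance testing is included.

```python
from math import sqrt


def tajimas_d(derived_counts, n):
    """Tajima's D for n sampled chromosomes.

    derived_counts: one integer per segregating site, giving the number of
    chromosomes carrying the derived allele (0 < count < n).
    """
    s = len(derived_counts)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 chromosomes and 1 segregating site")
    if any(k <= 0 or k >= n for k in derived_counts):
        raise ValueError("every site must be polymorphic in the sample")

    # Mean number of pairwise differences (pi), summed over sites
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in derived_counts)

    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

With ten chromosomes, a sample made up only of singletons (`tajimas_d([1] * 8, 10)`) gives a negative value, whereas sites at intermediate frequency (`tajimas_d([5] * 8, 10)`) give a positive value, in line with the interpretation given above.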

Other statistics, such as FST statistics, are based on the
level of genetic differentiation between populations histori- 
cally exposed to different selection pressures (Cavalli-Sforza 
1966; Lewontin and Krakauer 1973; Excoffier et al. 1992; 
Weir and Hill 2002). For example, geographically restricted 
positive selection tends to increase the degree of differentia- 
tion between a specific human population and other 
human populations, resulting in an increase in the FST value at
the selected locus. Conversely, balancing selection, negative 
selection or species-wide directional selection may result in 
an FST value lower than expected under a hypothesis of
neutrality (Cavalli-Sforza 1966; Bamshad and Wooding 
2003). 
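
The behavior of FST at a single biallelic locus can be sketched as follows. This is a deliberately simplified calculation, FST = (HT - HS) / HT with equally weighted populations, and not one of the sample-size-corrected estimators cited above; the function name `fst_biallelic` is hypothetical.

```python
def fst_biallelic(freqs):
    """Simplified FST at one biallelic locus.

    freqs: allele frequency of one allele in each population (equal weights).
    Returns (HT - HS) / HT, where HS is the mean expected heterozygosity
    within populations and HT the expected heterozygosity of the pooled
    population.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic across all populations
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total
```

Two populations with identical frequencies give a value of 0, two populations fixed for alternative alleles give a value of 1, and intermediate cases fall in between, which is why a locally selected variant tends to stand out as a high-FST outlier.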

Finally, another group of neutrality tests focus on more 
recent positive selection events (i.e., <30 000 years), by 
examining the patterns of haplotype homozygosity (haplo- 
type length) associated with particular alleles. These tests 
include the long-range haplotype (LRH), integrated haplotype
score (iHS), and LD decay (LDD) tests (Sabeti et al.
2002; Voight et al. 2006; Tang et al. 2007). These tests are 
all based on the comparison of the population frequency of 
a given mutation with the length of the haplotypes around 
it. Under neutral evolution, new alleles take a long time to 
reach high frequencies in the population, and haplotype 
lengths around these variants decrease substantially during 
this time, due to recombination. Thus, common alleles are 
typically old and associated with short haplotypes. In con- 
trast, a variant subject to recent positive selection would be 
expected to have an unusually long haplotype for its popu- 
lation frequency, because the advantageous allele increases 
in frequency too rapidly for recombination to have a major 
effect on haplotype length. 
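
The haplotype homozygosity on which these tests rely can be illustrated with a small sketch of extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying a given core allele are identical over the whole interval from the core site to a more distant marker. The function name `ehh`, the string encoding of haplotypes, and the toy data in the usage note are assumptions made for this example.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """Extended haplotype homozygosity for carriers of `allele` at site `core`.

    haplotypes: equal-length strings, one character per marker
    core, end:  marker indices; homozygosity is measured over the whole
                interval between them (inclusive)
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    lo, hi = sorted((core, end))
    # Group carriers by their haplotype over the interval
    groups = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the interval
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)
```

With the toy sample `["10101", "10101", "10101", "10101", "00101", "01101", "00011", "01010"]` and the core at index 0, carriers of allele `"1"` keep an EHH of 1.0 out to the last marker, whereas carriers of allele `"0"` drop to 0.0, which mimics the contrast between a recently selected allele and an older neutral one.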

Pitfalls due to the mimicking effects of 
demography 

Factors other than selection, such as particular demo- 
graphic processes, may also account for deviations from 
the neutral model. Some of the tests for detecting selection 
described above, including those based on the allele
frequency spectrum in particular, are sensitive to the confounding
effects of demography on genetic diversity patterns.
For example, an excess of rare alleles, giving negative
values of Tajima's D and Fu & Li's D* and F* statistics, 
may actually result from a sudden population expansion 
rather than from the effects of positive selection (Ptak and 
Przeworski 2002; Nielsen et al. 2005). Similarly, positive 
values of these tests are indicative not only of balancing 
selection but also of the presence of strong population 
structure within the study-population (i.e., the study-pop- 
ulation is indeed subdivided into different subpopula- 
tions), which may increase the proportion of alleles of 
intermediate frequency in the population (Kreitman 2000; 
Voight et al. 2005). 

However, it is possible to overcome these problems by 
applying a basic principle of population genetics: demo- 
graphic events affect the whole genome, whereas natural 
selection acts more locally and is restricted to particular 
genomic regions. Thus, when considering the impact of 
demographic factors on patterns of diversity, demographic 
models based on multiple, non-coding regions of the gen- 
ome, taking into account realistic scenarios for the demo- 
graphic history of human populations (e.g., population 
expansion, bottlenecks) can be incorporated into neutral 
expectations (Schaffner et al. 2005; Voight et al. 2005; Fag- 
undes et al. 2007; Laval et al. 2010). Similarly, empirical 
procedures can be used to compare the value of a given statistic
for the gene of interest (e.g., Tajima's D, FST, etc.)
with background expectations for that statistic generated 
from genome-wide data, which should reflect neutrality. 
Thus, simulation-based or empirical procedures can be 
used to distinguish between the effects of demographic fac- 
tors and those of natural selection events targeting specific 
genomic regions, providing evidence of past adaptation to 
new climates, foods or pathogens. 
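
The empirical procedure can be sketched in a few lines: the statistic observed for the gene of interest is ranked against the values of the same statistic computed for many other regions of the genome. The function below is a minimal sketch assuming that the background values are already available as a list; the name `empirical_p_value` and the choice of the upper tail are assumptions made for this example (for a statistic such as Tajima's D, the lower tail may be the relevant one).

```python
def empirical_p_value(observed, background):
    """Upper-tail empirical p-value of `observed` against genome-wide values.

    background: values of the same statistic computed for other genomic
    regions, assumed to be mostly neutral.
    Returns the fraction of background values at least as large as `observed`.
    """
    if not background:
        raise ValueError("background distribution is empty")
    at_least_as_extreme = sum(1 for value in background if value >= observed)
    return at_least_as_extreme / len(background)
```

Because demography is expected to shift the whole background distribution, a gene falling in the extreme tail of that distribution is a candidate for selection even when the standard neutral model would be rejected genome-wide.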

Adaptive phenotypes in human populations 

In the last few years, the accumulation of massive sets of 
genome-wide genetic variation data for diverse human 
populations, as in the HapMap and 1000 Genomes projects 
(International HapMap Consortium 2005; Frazer et al. 
2007; Altshuler et al. 2010; 1000 Genomes Project Consor- 
tium 2010), has made it possible to test blindly for the 
occurrence of selection throughout the entire genome. 
Genome-wide scans for selection have provided long lists 
of genes putatively targeted by positive selection, together 
with information about the genomic regions and biological 
functions most likely to have played a role in our adaptation
to the environment (reviewed by Akey 2009). So far,
candidate-gene approaches have provided the most con- 
vincing evidence for the action of natural selection on par- 
ticular genes, particularly when functional evidence is also 
available, despite the bias inherent to such approaches due
to the need to make assumptions at the outset concerning 
the genes likely to be subject to selection. Furthermore, one 
needs to be cautious when linking the patterns of selection 
observed with putative phenotypes, particularly when func- 
tional analyses are performed in laboratory animals, where 
the genes concerned may have very different functions with 
respect to humans. 

Adaptation to diet 

Many examples of the genetic adaptation of humans to diet 
have been described, including milk consumption, starch- 
rich diets, and bitter-taste perception (Bersaglieri et al. 
2004; Wooding et al. 2004, 2006; Balaresque et al. 2007; 
Perry et al. 2007; Tishkoff et al. 2007; Patin and Quintana- 
Murci 2008). Adaptation to milk consumption, through 
lactase persistence, is probably one of the best-known 
examples of natural selection in humans. A high concentra- 
tion of lactase ensures that lactose is digested effectively 
during the first few weeks of life. The lactase gene is then 
repressed, resulting in residual levels (5-10% of those just
after birth) of the enzyme in adults (Swallow 2003). How- 
ever, some populations, particularly those that have tradi- 
tionally practiced cattle husbandry, maintain the ability to 
digest milk into adulthood. This 'lactase persistence' trait is 
present at high frequency in European, Middle Eastern and 
pastoralist African populations (reaching frequencies of up 
to 90% in northern Europe) (Swallow 2003). 

The genetic basis of lactase persistence is now well
known; it is inherited as a dominant Mendelian trait in 
Europeans, with a mutation (C/T-13910) in the promoter 
of the lactase gene (LCT), increasing the expression of this 
gene (Enattah et al. 2002). The lactase-persistence -13910T 
allele displays clear signatures of recent positive selection in 
Europeans (i.e., it lies on a long-range haplotype that lar- 
gely exceeds its expected length under neutrality) (Bersagli- 
eri et al. 2004; Nielsen et al. 2005), and there is a strong 
correlation between the geographic distribution of this 
allele and historical milk consumption (Beja-Pereira et al. 
2003). Furthermore, the estimated age of expansion of this 
allele is between 5000 and 12 000 years ago (Beja-Pereira 
et al. 2003), suggesting that lactase persistence emerged in 
response to the cultural innovation of dairying associated 
with an agricultural lifestyle. 

Different results have been found among East African 
pastoralist populations that also present the lactase-persis- 
tence trait, as these populations do not present the -13910T 
allele. Instead, a different mutation in the LCT promoter 
region (-14010C), which also increases LCT expression,
has been shown to be positively selected over the last 
7000 years among East African pastoralists (Tishkoff et al. 
2007). Altogether, these results demonstrate that the
cultural trait of milk consumption has conferred a strong
selective advantage in terms of human survival in different 
parts of the world. The most obvious selective advantage 
provided by the lactase-persistence trait is the possibility 
for lactose-tolerant individuals to access a valuable food 
source in times that were characterized by food shortage. 
An alternative, but not mutually exclusive, hypothesis is 
that, because lactose-intolerant individuals present symptoms
such as watery diarrhea, individuals harboring
the lactase persistence-associated alleles may have had a 
strong selective advantage for survival, particularly in the 
nutritionally poor and pathogen-rich conditions of the 
past. Today, lactose-intolerance can still be a disadvanta- 
geous trait, as illustrated by the considerable mortality 
observed in non-tolerant African children following con- 
sumption of milk products from alimentary aid. 

Adaptation to climate 

Another example of genetic adaptation to changing envi- 
ronments is provided by the exposure of ancestral popula- 
tions to colder climates and lower levels of incident 
sunlight after early migrations out-of-Africa (Coop et al. 
2009). These changes in climatic conditions led to variation 
in the quantity, type, and distribution of melanin in the 
skin, resulting in various levels of skin pigmentation (Jab- 
lonski and Chaplin 2000, 2012). Darker skin was favored in 
regions of strong UV irradiation, such as the African conti- 
nent, due to the obvious protection it provides against 
photodamage (e.g., sunburn, melanoma, and basal and 
squamous cell carcinomas) (Kollias et al. 1991). Further- 
more, darker skins protect against UV-induced photolysis 
of folate, a metabolite essential for the normal development 
of the embryonic neural tube and spermatogenesis (Branda 
and Eaton 1978; Jablonski and Chaplin 2000). In turn, the 
evolution of lighter skin may reflect either a relaxation of 
functional constraints or a selective advantage for lighter 
skins, in regions of low UV radiation. There is increasing 
evidence supporting the selective hypothesis for lighter 
skin, as it allowed higher levels of vitamin D photosynthesis 
in regions with lower levels of UV irradiation (Jablonski 
and Chaplin 2000, 2012). In addition to these biological 
factors, sexual selection has been proposed as a factor fur- 
ther contributing to selection for skin color in the human 
population (Harpending 2002). 

A number of genes involved in skin pigmentation have 
been identified, with effects at various stages of the pigmen- 
tation pathway (i.e., from melanogenesis to the production 
and maintenance of melanosomes and the switch between 
eumelanin and pheomelanin production) (Shriver et al. 
2003; Lamason et al. 2005; Stokowski et al. 2007; Sulem 
et al. 2007; Han et al. 2008; McGowan et al. 2008). There 
is increasing evidence to suggest that many of these 
pigmentation genes, such as ASIP, OCA2, SLC24A5, MATP,
or TYR, have been subject to positive selection across the 
globe or in specific human populations (Lamason et al. 
2005; Izagirre et al. 2006; Soejima et al. 2006; Myles et al. 
2007; Norton et al. 2007; Pickrell et al. 2009). For example, 
the SLC24A5 gene, for which the A111G variant is associated
with lighter skin color, presents strong signals of positive
selection in Europeans, based on FST values and the
haplotype homozygosity surrounding this variant (Lama- 
son et al. 2005; Izagirre et al. 2006; Myles et al. 2007; Nor- 
ton et al. 2007; Barreiro et al. 2008; Pickrell et al. 2009). In 
this case, positive selection may have increased the fre- 
quency of this mutation to maximize cutaneous vitamin D 
synthesis in areas with lower levels of UV irradiation, such 
as Europe (Norton et al. 2007). 

Another interesting case of selection in response to envi- 
ronmental conditions is adaptation to altitude to avoid 
hypoxia, a physiological stress of the body due to the lower 
levels of oxygen at altitude. Various populations that have 
historically lived at high altitude, such as Andeans from the 
Andean Altiplano and Tibetans from the Himalayan pla- 
teau, have unique sets of physiological and morphological 
characteristics (e.g., increased respiratory and heart rate, 
changes in pulmonary artery pressure, enlarged thorax) that 
have allowed them to adapt to high-altitude conditions, 
where oxygen concentrations are about 40% lower than 
those of sea level. The genetic basis of such adaptations to 
extreme environmental conditions has started to be deci- 
phered only recently. Recent studies have provided evidence 
of positive selection targeting various genes or gene regions 
involved in oxygen metabolism and sensing in Andeans and 
Tibetans, the strongest signals of selection being observed 
in the EPAS1, EGLN1, PRKAA1, and NOS2A genes (Bigham 
et al. 2010; Yi et al. 2010). These studies have helped to 
delineate a number of functionally important loci responsi- 
ble for the genetic adaptation to high altitude. 

Adaptation to pathogen pressures 

Another important selective pressure that has confronted 
humans over time is that imposed by pathogens and infec- 
tious diseases. Indeed, pathogens have been, and still are in 
regions in which antibiotic treatment, vaccine administra- 
tion and hygiene improvements are limited, a major cause 
of human mortality, thus exerting strong selective pressure 
on the human genome (Casanova and Abel 2005; Quin- 
tana-Murci et al. 2007; Barreiro and Quintana-Murci 2010; 
Sironi and Clerici 2010). Pathogen-driven balancing selection
has been clearly demonstrated for the human leukocyte
antigen (HLA) genes, the tremendous diversity of
which is strongly correlated with residence in an area in 
which there are large numbers of pathogen species (Pru- 
gnolle et al. 2005). 



Several studies have identified multiple signatures of 
selection in response to the presence of Plasmodium parasites,
the agents responsible for malaria (Kwiatkowski 2005).
For example, the high frequency of some hemoglobinopa- 
thies has been correlated with greater resistance to Plasmo- 
dium falciparum malaria: the HbS allele or 'sickle cell trait' 
is a textbook example of positive selection due to an infec- 
tious disease, increasing resistance to life-threatening forms 
of malaria by an order of magnitude when present in the 
heterozygous state (Allison 1954; Ackerman et al. 2005; 
Kwiatkowski 2005). Another example is provided by glucose-6-phosphate
dehydrogenase (G6PD) deficiency.
Patients with this condition have abnormally low levels of 
G6PD, which is particularly important for red blood cell 
metabolism. More than a hundred variants can lead to this 
deficiency, some of which are selected because they confer 
greater protection against falciparum or vivax malaria 
(Tishkoff et al. 2001; Saunders et al. 2002; Louicharoen 
et al. 2009). An extreme example is provided by the DARC 
gene, null alleles of which result in an absence of protein, 
preventing Plasmodium vivax from penetrating into host 
cells (Tournamille et al. 1995). Positive selection for the 
null allele has been demonstrated in sub-Saharan Africans, 
and this allele is almost fixed in some Central African pop- 
ulations, whereas it is virtually absent from populations 
originating from other parts of the world (Livingstone 
1984; Hamblin and Di Rienzo 2000; Hamblin et al. 2002). 
Plasmodium vivax, or another pathogen using the same 
mode of entry into host cells, has been identified as the 
most probable source of selective pressure in this case 
(Hamblin and Di Rienzo 2000; Hamblin et al. 2002). The 
selective pressures imposed by malaria can also act at the 
micro-geographic scale and be variable among closely 
related human populations. The case of the Fulani from 
West Africa, who present a specific resistance to malaria 
with respect to other ethnic groups living in the same area, 
is a prime example (Modiano et al. 1999). 

Multiple mutations underlying similar adaptive 
phenotypes 

Many of the best-studied cases of genetic adaptation in
humans have revealed multiple, independent mutations
that confer an advantage under the same or similar
selective pressures. Again, the lactase persistence trait is
not only one of the best-documented cases of positive
selection in the human genome, but also a clear example
of convergent adaptation in response to a strong selective
force related to diet.
Indeed, different lactase-persistence alleles are found in 
Europe, the Middle East, and Africa, where they have 
increased in frequency independently, as a result of strong 
positive selection (Bersaglieri et al. 2004; Tishkoff et al.
2007; Enattah et al. 2008; Itan et al. 2010). Even within a
given population (e.g., East African pastoralists), multiple 
mutations have been found to explain the lactase-persis- 
tence trait. Furthermore, the estimated times at which these 
mutations began to increase in frequency differ slightly 
between Europe and Africa (Bersaglieri et al. 2004; Tishkoff 
et al. 2007), and coincide with the cultural practice of pas- 
toralism (and therefore milk consumption) in these popu- 
lations. A possible explanation for this fascinating example 
of convergent adaptation within our species is that muta- 
tion rates to adaptive alleles might be high enough to allow 
novel adaptive mutations to arise in distinct geographic 
regions before any single variant spreads globally (Coop 
et al. 2009). 

Other cases of multiple mutations in a given gene, or 
even mutations located in different genes, underlying simi- 
lar adaptive phenotypes in humans include resistance to 
malaria, adaptation to high altitude, or skin pigmentation. 
Indeed, various G6PD-deficiency alleles have been indepen- 
dently targeted by positive selection in African, Mediterra- 
nean, or South-East Asian populations, as they conferred 
higher resistance to falciparum or vivax malaria (Tishkoff 
et al. 2001; Saunders et al. 2002; Louicharoen et al. 2009). 
Likewise, different genes have been found to explain adap- 
tation to high altitude in Tibetans (EPAS1) or Andeans 
(PRKAA1, NOS2A), and for the only gene targeted by posi- 
tive selection in both populations, EGLN1, the selected 
haplotypes markedly differ between the two populations 
(Bigham et al. 2010; Yi et al. 2010). This clearly suggests
that Tibetans and Andeans have followed different evolutionary
trajectories to adapt to the same pressure, high altitude.
Finally, the case of light skin in Europeans and East
Asians provides another example of convergent evolution
(Norton et al. 2007). Indeed, different genes, associated
with lighter skin color in these populations, have been
independently targeted by positive selection in Europeans 
(SLC24A5, MATP, and TYR) and East Asians (ADAM17 
and ATRN) (Lamason et al. 2005; Izagirre et al. 2006; Soej- 
ima et al. 2006; Myles et al. 2007; Norton et al. 2007; Pick- 
rell et al. 2009). Although the role of sexual selection 
cannot be ruled out, these data support independent 
genetic mechanisms for evolution of skin color. 

The special case of cancer selection 

The relationship between selection and cancer remains
unclear (Leroi et al. 2003). Some cancers, such as acute
lymphoblastic leukemia, mostly affect children (Ribera and
Oriol 2009), and so they may select for anti-cancer
adaptations that reduce the chance of death. Furthermore,
although most cancers occur after the reproductive age, the
variance in age of onset is large, and a non-negligible
fraction of individuals is affected at much younger ages
(Couch and Weber 1996). With this in mind, the possibility
that cancers exert a direct selection pressure cannot be
ruled out. In this context, theoretical considerations predict
that the risk of developing cancer should be higher for
large and long-lived species than for short-lived
organisms, but this does not seem to be the case (Caulin
and Maley 2011). This observation, known as Peto's
paradox (Peto et al. 1975), has been used to suggest that
large animals might have developed mechanisms to
resist cancer (Leroi et al. 2003; Nagy et al. 2007; Caulin
and Maley 2011), supporting the notion that cancer may
indeed have been evolutionarily counter-selected.
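
The theoretical expectation behind Peto's paradox can be made concrete with a deliberately naive calculation. The sketch below assumes that every cell division independently carries the same small probability of malignant transformation; this assumption, the per-division probability and the division counts are all hypothetical and serve only to show why larger, longer-lived organisms would be expected to face a higher risk.

```python
import math


def naive_lifetime_risk(total_divisions, p_transform):
    """Probability that at least one cell division leads to cancer.

    Assumes each division independently carries the same small
    probability p_transform of malignant transformation (a naive,
    hypothetical model).
    """
    # 1 - (1 - p)^n, computed stably for very small p and very large n
    return -math.expm1(total_divisions * math.log1p(-p_transform))


if __name__ == "__main__":
    p = 1e-13  # hypothetical per-division probability
    for label, divisions in (("small, short-lived", 1e11),
                             ("intermediate", 1e13),
                             ("large, long-lived", 1e15)):
        print(label, naive_lifetime_risk(divisions, p))
```

In this naive model the risk climbs steeply toward certainty as the lifetime number of cell divisions grows. The observation that cancer risk does not scale in this way across species is what suggests the existence of additional protective mechanisms in large animals.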

The long-held view that cancers do not generally affect 
individuals of reproductive age has constituted a real obsta- 
cle to the use of the evolutionary genetics approach in this 
field. As a consequence, only a few studies have addressed 
directly the question of how selection has affected the evo- 
lution of cancer-related genes. Among them, the case of 
BRCA1 is worth mentioning, as this gene displays one of
the strongest, replicated associations with breast and ovarian
cancers (for a review, see Lee and Boyer 2001). Interestingly,
BRCA1 has been found to evolve under purifying
selection in humans (Pavard and Metcalf 2007). The related
BRCA2 gene, which is strongly associated with the same
diseases, also appears to be evolving
under purifying selection (Bustamante et al. 2005). These
observations are particularly interesting, as hereditary 
forms of breast cancer have a high impact in young adults 
(Hall et al. 1990). We might therefore expect genes associ- 
ated with cancers with a strong impact on young adults or 
children to be highly constrained by purifying selection. 

Signatures of positive selection have also been reported 
for certain genes known to play a direct or indirect role in 
cancer risk. Some of these studies focused on signatures of 
selection between species, to highlight adaptive selection in 
humans, as a whole, with respect to other species. For 
example, at the inter-species level, BRCA1 appears to evolve 
adaptively, with some specific codons in its DNA repair 
domain being targeted by positive selection (Huttley et al. 
2000; Fleming et al. 2003; Pavlicek et al. 2004). This find- 
ing and the above-mentioned signature of purifying selec- 
tion in humans (Pavard and Metcalf 2007) are not 
incompatible but instead they suggest that BRCA1 evolved 
adaptively before human speciation (accumulating various 
amino-acid changes), subsequently becoming strongly con- 
strained within the human lineage. 

At the intra-species level, some studies making use of
polymorphism data in humans have found signatures
of positive selection at some genes. For example, a genomic
scan of 142 genes for signals of recent positive selection 
identified the PPP2R5E gene, which is involved in the nega- 
tive regulation of cell growth and division (Voight et al. 
2006; Grochola et al. 2009). Another selection scan of 132
genes found strong signatures of positive or balancing 
selection in a region extending over more than 115 kb in 
European-Americans, the strongest signals being centered 
on TRPV6 (Akey et al. 2004; Stajich and Hahn 2005). This 
gene has been shown to be up-regulated in prostate cancer 
and associated with an aggressive form of the disease
(Wissenbach et al. 2001; Paiss et al. 2003). One of the most
interesting examples is probably that of UGT2B4, which 
has been strongly associated with increased risk of breast 
cancer in Nigerians and, to a lesser extent, African Ameri- 
cans, and signatures of selection (balancing selection or 
recent positive selection) have been found in the upstream 
portion of the gene (Sun et al. 2011). Altogether, these 
examples suggest that there might be a direct or indirect 
relationship between cancer and selection, but the overall 
genome-wide proportion of cancer-related genes actually 
targeted by selection remains to be determined. 

Maladaptation as a consequence of past selection 

There is increasing evidence to suggest that some diseases 
of modern societies, such as obesity, hypertension, inflam- 
matory or autoimmune diseases, allergies or even cancers, 
may simply be a by-product of past adaptation to other 
selective forces and changes in lifestyle. Genetic modifica- 
tions occur more slowly than changes in lifestyle. It has 
therefore long been suggested that ancient selection events 
may have favoured variants that are no longer advanta- 
geous and may even have become detrimental in modern 
societies. 
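
The lag between lifestyle change and genetic change can be illustrated with a simple deterministic calculation. The sketch below assumes genic (multiplicative) selection, under which the odds p / (1 - p) of an allele are multiplied by (1 + s) each generation; the selection coefficients, the start and end frequencies, and the generation time of 25 years are hypothetical values chosen for illustration.

```python
import math


def generations_to_shift(p_start, p_end, s):
    """Generations needed for an allele to move from p_start to p_end.

    Assumes genic (multiplicative) selection: each generation the odds
    p / (1 - p) are multiplied by (1 + s). A negative s means the allele
    has become disadvantageous.
    """
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return math.log(odds_end / odds_start) / math.log(1.0 + s)


if __name__ == "__main__":
    generation_time = 25  # years per generation, assumed
    for s in (-0.001, -0.01, -0.05):
        t = generations_to_shift(0.5, 0.05, s)
        print(f"s = {s}: {t:.0f} generations, "
              f"about {t * generation_time:.0f} years")
```

Under these assumptions, a variant that has become mildly detrimental (s = -0.01) needs close to 300 generations, that is, several thousand years, to decline from a frequency of 50% to 5%. This is far longer than the timescale of recent changes in lifestyle.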

Changes in nutritional and pathogenic pressures 
over time 

The 'thrifty gene hypothesis' was first introduced by James 
Neel (Neel 1962), who suggested that genes conferring a 
predisposition to diabetes (called 'thrifty genes') were his- 
torically advantageous but had become detrimental in the 
modern world. Some variants had been positively selected 
in the past because they favored the accumulation of larger 
amounts of fat, greatly increasing the likelihood of survival 
between famines. However, the change to a sedentary lifestyle
and the increase in food abundance have increased the
risk of developing type II diabetes in individuals carrying
these variants today. Several studies have revealed particularly
high risks of diabetes and high levels of obesity in populations
that have recently and rapidly switched to a 'Western
lifestyle', such as the Native Americans of the United States
(Joffe and Zimmet 1998). 

Another example of maladaptation may reflect changes 
in the selective pressures imposed by infectious diseases 
over time (Barreiro and Quintana-Murci 2010; Sironi and 
Clerici 2010). Pathogens are still present, but
improvements in hygiene and the use of antibiotics and
vaccines have greatly weakened the selection pressures they 
impose, particularly in Western countries. A strong, exacer- 
bated immune response, which may have been the best way 
to survive in pathogen-rich environments in the past, 
seems to have become a burden in modern societies, as an 
overly vigorous response increases the risk of developing 
inflammatory and autoimmune diseases (Le Souef et al.
2000; Sironi and Clerici 2010). Indeed, populations with a
long-term tropical ancestry introduced into temperate
regions appear to have a higher risk of developing allergic
inflammatory diseases and asthma, as shown for African
Americans and for Asians living in the United Kingdom
(Gillam et al. 1989; Gold et al. 1993; Von Behren et al.
1999). In all these cases, ethnic background was found to
have a greater impact than environment on differences in
the prevalence of asthma (Gilthorpe et al. 1998; Miller
2000). From a population genetics standpoint, some alleles
conferring a higher risk of inflammatory and autoimmune 
diseases have been shown to have been under strong selec- 
tive pressure in the past (Barreiro and Quintana-Murci 
2010). This observation supports the notion that the higher 
risk of developing inflammatory and autoimmune disor- 
ders may be a by-product of past selection in response to 
infectious disease (Sironi and Clerici 2010). 

Compromises during life 

Another compromise that has been suggested for adapta- 
tion is antagonistic pleiotropy. Pleiotropy is defined as the 
situation in which one gene controls more than one pheno- 
typic trait in an organism; 'antagonistic pleiotropy' is the 
term used when at least one trait is beneficial and another 
is detrimental. The antagonistic pleiotropy hypothesis was 
first proposed by George C. Williams in 1957 to explain
senescence (Williams 1957). Indeed, variants that were 
advantageous early in life could become detrimental later 
on (Stearns et al. 2010), as genetic variation may affect sev- 
eral pathways at different ages. For example, high levels of 
testosterone in the bloodstream are associated with a 
greater fitness in early life, but are associated with a higher 
risk of developing prostate cancer at later stages (Gann 
et al. 1996). Another interesting case is birth weight: it has 
been suggested that selection for high birth weight, which 
is beneficial for survival early in life, is counterbalanced by 
a higher risk of various cancers later in life (Thomas et al. 
2012). 
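
The logic of antagonistic pleiotropy can be sketched with a toy life table. The example below compares the expected lifetime reproductive output of carriers and non-carriers of a variant that slightly raises early fecundity but halves survival into a late, post-reproductive age class. All survival and fecundity values are hypothetical and chosen only to illustrate the argument.

```python
def net_reproductive_output(survival, fecundity):
    """Expected lifetime offspring, R0 = sum over age classes of l_x * m_x.

    survival[x]  : probability of surviving into age class x from the
                   previous class
    fecundity[x] : expected offspring produced while in age class x
    """
    alive = 1.0
    total = 0.0
    for s_x, m_x in zip(survival, fecundity):
        alive *= s_x          # cumulative survival l_x
        total += alive * m_x  # contribution of this age class
    return total


# Hypothetical life tables with five age classes (illustrative values only).
BASELINE_SURVIVAL = [0.9, 0.9, 0.9, 0.8, 0.6]
BASELINE_FECUNDITY = [0.0, 1.0, 1.0, 0.3, 0.0]

# Carrier of a pleiotropic variant: 5% higher early fecundity, but
# survival into the last (post-reproductive) class is halved.
CARRIER_SURVIVAL = [0.9, 0.9, 0.9, 0.8, 0.3]
CARRIER_FECUNDITY = [0.0, 1.05, 1.05, 0.3, 0.0]

if __name__ == "__main__":
    print(net_reproductive_output(BASELINE_SURVIVAL, BASELINE_FECUNDITY))
    print(net_reproductive_output(CARRIER_SURVIVAL, CARRIER_FECUNDITY))
```

In this toy example the carrier has the higher reproductive output, because the late-life cost falls on an age class that no longer contributes offspring. A variant of this kind could therefore be favoured despite its detrimental effect later in life.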

Conclusion 

Population genetic studies have greatly improved our 
knowledge of the way in which humans have genetically 
adapted to variation in environmental pressures and
changes in lifestyle over time. Likewise, the investigation of
how natural selection has driven the evolution of particular 
genes and biological functions has proven a useful tool to 
inform the relationship between genetic diversity, adaptive 
phenotypes and disease. In this context, detailed studies 
combining population genetic approaches making use of 
the massive whole-genome sequence-based data sets for 
various human populations, such as the 1000 Genomes 
Project (1000 Genomes Project Consortium 2010; Abecasis 
et al. 2012), and genome-wide association studies of multi- 
ple diseases and other traits of interest are now required. In 
particular, much more work is needed with respect to the 
increasingly recognized role of copy number variation in 
human adaptation and disease phenotypes, as rapid gains 
and losses of genomic segments can have a substantial 
impact on phenotypic variation (Feuk et al. 2006; Girirajan 
et al. 2011; Iskow et al. 2012). Similarly, we still have much 
to learn from the integration of data from population 
genetics, epigenetics and epidemiological genetics studies 
in populations with different lifestyles and modes of subsis- 
tence (e.g., agriculture, hunter-gathering, sedentary, noma- 
dic) or living in different environments (e.g., urban, rural, 
forest). Such multidisciplinary efforts are clearly required 
to clarify the relationship between natural selection and 
disease and to improve our understanding of the evolu- 
tionary mechanisms accounting for the present-day differ- 
ences in disease susceptibility, resistance or progression 
observed. Ultimately, these integrative approaches are likely 
to be essential for dissection of the contribution of geno- 
typic, epigenetic, and environmental variables to the cur- 
rent risk of many diseases, facilitating improvements in 
their diagnosis, prevention and treatment.